Abstract: Diagnosis of malaria must be rapid, accurate, simple to use,
portable and low cost, as suggested by the World Health Organization 
(WHO). Despite recent efforts, the gold standard remains the light 
microscopy of a stained blood film. This method can detect low parasitemia 
and identify different species of Plasmodium. However, it is time
consuming and requires a well-trained microscopist and good instrumentation
to minimize misinterpretation, so the costs are considerable. Moreover,
the equipment cannot be easily transported and installed. In this paper we 
propose a new technique named "secondary speckle sensing microscopy"
(S³M) based upon extraction of correlation-based statistics of speckle
patterns generated while illuminating red blood cells with a laser and
inspecting them under a microscope. Then, using fuzzy logic ruling and
principal component analysis, good separation between healthy
and infected red blood cells was demonstrated in preliminary experiments.
The proposed technique can be used for automated high rate detection of 
malaria infected red blood cells. 

© 2012 Optical Society of America 

OCIS codes: (170.4580) Optical diagnostics for medicine; (170.3880) Medical and biological 
imaging; (170.1530) Cell analysis; (170.0180) Microscopy; (120.6160) Speckle interferometry; 
(170.1470) Blood or tissue constituent monitoring; (170.6480) Spectroscopy, speckle.

1. Introduction

Malaria is an infectious disease transmitted by the bite of a female Anopheles mosquito infected by
Plasmodium. Every year, 243 million new cases are reported by the World Health 
Organization (WHO) with almost a million deaths, mostly of African children. Due to the 
nonspecific symptoms (fever) and the absence of a rapid and efficient diagnostic tool, 
presumptive antimalarial treatment is often preferred. This increases the risk of mortality due 
to inappropriate therapy and favors the emergence of drug resistance. Thus, new tools for a 
prompt and accurate malaria diagnosis are urgently needed [1]. 

The ideal diagnostic tool for malaria in endemic countries must be rapid, accurate, simple 
to use, low cost, and easily interpretable. The gold standard in malaria diagnosis is the 
microscopic Giemsa-stained blood smear (MSB), which remains the only method allowing 
detection with high sensitivity and specificity. Detection of low parasitemia (0.0001%, i.e., 1 infected cell
in 10^6 cells) and identification of the parasite species (Plasmodium falciparum,
Plasmodium vivax, Plasmodium malariae, Plasmodium ovale) are thus possible [2,3].

However, misinterpretation (artifacts such as fungi, bacteria, or cell debris mistaken for malaria
parasites) commonly occurs in poorly equipped laboratories with low-quality microscopes,
where only the experience of a well-trained microscopist can reduce the errors [2,4]. The time
to diagnosis is about 8-10 hours in African medical centers [2]. The cost of equipment and
training is considerable, even if the apparent cost of an individual sample examination is
relatively low. Moreover, the equipment cannot be easily transported and installed. Improved
diagnostic tools are therefore in high demand. New
technologies could be used to enhance accuracy while reducing the time, complexity and cost
of current diagnosis.

Malaria symptoms (fever, sweating, etc.) are associated with the asexual intraerythrocytic life
cycle of the parasite. After the initial replication inside the hepatocytes, the merozoites invade 
the host red blood cells (RBC), where they grow adopting different morphologies from the 
early ring shape to the late trophozoite. At this stage, the parasites undergo several cellular
divisions, forming schizonts which release 16-32 new merozoites into the blood stream after
erythrocytic membrane disruption [5]. 

The parasite development continuously remodels the membrane and cytoskeleton of the 
host RBC. Modifications of more than 100 proteins of the host RBC proteome that could have
an important impact on the morphology and rheological properties of the infected RBC
(iRBC) and on malaria pathogenesis have recently been reported [6]. Most of them are
exported to the iRBC surface. These make the iRBC membrane more adhesive, thereby promoting
cytoadherence [7]. Progressive stiffening of the cell membrane is also associated with
parasite growth [8]. Stiffness influences deformability. Thus, deformability is mildly reduced in
rings (increased sphericity) and markedly altered in schizonts [9]. Seminal studies in P.
falciparum-infected patients have established a correlation between a decreased elongation 
index of the circulating RBC population and anemia or poor prognosis [10]. The 
biomechanical properties of the uninfected RBCs (uRBC) present in parasitized blood are also 
altered in vitro and in vivo, but the mechanism and the causes are still unclear [11-14]. 
Molecular biology and proteomics are largely used to investigate the membrane structures in 
both healthy and parasitized RBC [15]. However, these techniques cannot fully explain the 
biomechanical modifications (such as elasticity, deformability and stiffness) occurring during 
malaria infection. 

There are several techniques available nowadays to measure biomechanical changes of
cells. They can be classified into microfluidic and optical techniques. Microfluidics is based
on recent progress in micro- and nanofabrication techniques and on the use of new materials, which
make it possible to create microchannels and flows with properties similar to those of blood
vessels [16,17]. Exploiting the hemodynamic effects (e.g. Fahraeus effect, margination) of the
cells in a microfluidic flow, plasma separation [18] and white blood cell (WBC) enrichment
[19] have been proposed. The margination effect, based on the deformability of RBCs, has
recently been used for separation of malaria iRBCs [20]. The performance, in terms of
accuracy, is still far from that of MSB, but the method can be used as an iRBC pre-concentration
approach. A microfabricated deformability-based flow cytometer with micro-pillars having
suitable size, shape and gap between them, has also been proposed for the separation of 
iRBCs [21]. However, since the gap between the pillars is used to filter the cells by their size, 
the channels are rapidly obstructed during the separation process, thereby limiting the use of 
the device. 

Alternatively, optical techniques are mainly based on novel diffraction microscopy 
schemes, made possible by the development of new laser sources and fast and sensitive 
detectors. Diffraction phase microscopy (DPM) [22] has been proposed for a more accurate 
characterization of the dynamics of the cell membrane, and hence its stiffness, and the 
composition of the cell [23]. Moreover, multimodal imaging in microscopy provided by the 
combination of different techniques is a valuable tool for quantifying information about cell 
dynamics and evolutions at nanoscale range by cross-checking information between them 
[24-29]. Measuring the stiffness from the thermal fluctuations of the membrane, Park et al.
presented comparative results for healthy and infected RBCs at different erythrocytic stages
[25]. The mean shear modulus varies from 14 µN/m for the ring stage to 72 µN/m for the
schizont stage, while for healthy RBCs it is only 6 µN/m.

Nevertheless, the distributions for each RBC type are characterized by large standard
deviations, and hence there is considerable overlap between the data, making a clear
separation difficult. Considering that the membrane fluctuations at different points of the cells are
measured accurately, it is likely that the results of this measurement could be processed to extract
more features than the stiffness alone. The same method also yields the refractive index
map of the cell and hence information about the structure of the RBC. However, the
implementation of this technique is still at the laboratory level, requiring relatively expensive
instrumentation which cannot be easily transported. Moreover, only a small number of cells
can be processed per unit of time.

In this paper we propose an adaptation of the secondary speckle sensing approach
described in Refs. [30-33] for rapid, high-rate and high-accuracy
automatic detection of malaria. The approach involves illuminating the RBCs with a tilted
laser beam. The microscope, with its focus properly adjusted, captures time-varying speckle
patterns generated by the thermal movement of the RBCs. This movement is analyzed via a
correlation-based algorithm that extracts the change in the position and in the value of the
correlation peak. Then, the statistics of the position and value of the correlation peak are
analyzed using two automated approaches: fuzzy logic based ruling and principal component
analysis (PCA). We construct the full system as well as the automatic detection
algorithm, and present preliminary experimental results showing the potential for automatic
detection of malaria.

Note that the main difference between the speckle-based technique and
quantitative phase microscopy methods is the simplicity of realization or of
the calibration stage of the optical setup. The speckle-based approach is also directly related to
a physical quantity: it measures the movements of the cells directly, rather than the phase that
changes due to those movements. It is also simpler to translate the movement of the cell (i.e.
its size and direction) into the change obtained in the speckle
pattern. With the proposed speckle-based approach it is also relatively simple to tune the
sensitivity of detection, i.e. the smallest movement of the cell that can still be detected,
e.g. in the Z (axial) direction. This is done simply by changing the defocusing
of the objective lens. Defocusing also changes the size of the speckle patterns, which in turn
affects the measurement sensitivity.

The paper is organized as follows: in Section 2 we present the optical setup, and in
Section 3 the experimental results. Section 4 addresses the fuzzy logic based algorithm, while
Section 5 focuses on the PCA algorithm. Section 6 concludes the paper.

2. Secondary speckle sensing microscopy (S³M) setup

The S³M setup is depicted in Fig. 1 and consists of a custom inverted microscope in which the
sample is illuminated by a tilted laser beam (Ar⁺, 514.5 nm, LaserPhysics, Cheshire, UK).
Additionally, an on-axis white light fiber optic illuminator is used for reference imaging and
alignment purposes only. Note that the tilted laser was used due to mechanical/physical
constraints of the constructed system as well as to avoid direct reflections of the laser
into the camera. The defocusing is needed in order to convert the tilting movement into shifts,
since the addition of a linear phase becomes a shift in the Fourier domain (far field approximation),
and shifts can easily be detected by correlation-based computations such as those applied in
this paper.
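
The role of defocusing can be illustrated numerically with the Fourier shift theorem: a linear phase ramp (a tilt) applied to a field displaces its far-field pattern without changing its shape. The following is a minimal sketch and not the processing used in this work; the 1D Gaussian illumination profile, the grid size and the tilt value are illustrative assumptions.

```python
import numpy as np

def far_field_peak(tilt_cycles, n=256):
    """Return the index of the far-field intensity peak of a tilted 1D field.

    tilt_cycles: number of full phase cycles across the grid (assumed to be
    an integer here so that the shift falls exactly on an FFT bin).
    """
    x = np.arange(n)
    aperture = np.exp(-((x - n / 2) ** 2) / (2 * 20.0 ** 2))  # smooth illumination profile
    tilt = np.exp(2j * np.pi * tilt_cycles * x / n)            # linear phase = tilt
    far_field = np.fft.fftshift(np.fft.fft(aperture * tilt))   # far-field approximation
    return int(np.argmax(np.abs(far_field) ** 2))

# The peak moves by exactly `tilt_cycles` bins relative to the untilted case.
shift = far_field_peak(5) - far_field_peak(0)
```

In this toy model a tilt of five phase cycles moves the far-field peak by five bins, which is the kind of lateral shift that a correlation-based computation can then track.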



[Figure 1 labels: laser source; fiber optic illuminator; microfluidic chip; microscope optics (objective lens + tube lens); aerial image plane of the RBCs; CMOS sensor at the image plane of the unfocused speckle. Right: image of one RBC under white light illumination; image of the unfocused speckle under laser illumination.]

Fig. 1. Secondary speckle sensing microscopy (S³M) optical setup.

The sample is imaged by an objective (PL APO 100x/1 water, Olympus) and a tube lens
(achromatic doublet, f = 300 mm) onto the sensor of a CMOS camera (Fastec Hispec-4, from
Gold Elettronica, Chiavari, Italy). After cell identification, the sample is moved several
micrometers upward along the optical axis so that an unfocused image of the speckle
pattern is recorded on the CMOS camera. As an example, Fig. 1 (right) shows the image of a
healthy RBC under white light illumination and the image of the unfocused speckle obtained
from the same cell under tilted laser illumination. The speckle pattern produced by the laser
beam is recorded for 1 second at a high frame rate (2000 fps). This high acquisition frame rate
allows proper sampling of the cell membrane flickering due to thermal vibration [24]. The
RBCs are introduced and dragged to the observation area using a simple microfluidic chip
with a single channel cut by a CO2 laser in a 2 mm Plexiglas sheet and with a glass cover slip
glued on the bottom.

Experiments involved tracking the speckle patterns captured for healthy and
malaria-infected RBCs. The movies were recorded after moving the microfluidic chip with the cells
upward by different distances: 2, 20 and 50 µm, respectively. Data processing produced similar
results in terms of discrimination between infected and healthy cells.

3. Sample preparation and data analysis

Donor blood was kindly provided by the Blood Bank Service of Azienda Ospedaliero-Universitaria
Ospedali Riuniti di Trieste (nonprofit organization for blood donation).
Citrate-anticoagulated blood was obtained from healthy A-positive donors after informed consent.
Aliquots of blood were centrifuged at 2300 rpm for 13 min to remove the buffy coat and the
erythrocyte pellet was washed three times with 5 mM phosphate-buffered saline (PBS). P.
falciparum (W2 strain) culture was carried out according to Trager and Jensen [34] and
maintained at 5% hematocrit (human type A-positive red blood cells) in RPMI 1640 medium
with the addition of 1% AlbuMax, 0.01% hypoxanthine, 20 mM Hepes, and 2 mM glutamine.
Asynchronous cultures with parasitaemia of 4-5% were diluted to 0.01% final hematocrit,
aliquoted into a petri dish and incubated at room temperature. Parasitaemia was checked by
Giemsa staining of a blood smear before the optical experiments.

We analyzed 25 cell samples: 12 healthy RBCs (hRBC) and 13
RBCs infected by P. falciparum (iRBC). The recorded speckle pattern movies were
digitally processed in order to define a list of parameters for classifying the
RBC samples. Essentially, the processing correlates the secondary speckle
patterns of successive frames, and from this correlation we extract various
time-varying statistical parameters regarding both the position of the correlation peak and
its value. Working with secondary speckle patterns has the advantage that the way
their spatial distribution varies with the axial distance is very much dependent on the distance
itself and can be estimated with high precision using simple numerical, correlation-based tools.
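
The correlation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses an FFT-based circular cross-correlation with pixel-level peak localization (no sub-pixel interpolation), and the function name `correlation_peak` and the synthetic test frame are hypothetical.

```python
import numpy as np

def correlation_peak(frame_a, frame_b):
    """Cross-correlate two equally sized frames via FFT.

    Returns ((dy, dx), value): the shift of frame_b relative to frame_a at
    the correlation maximum, and the correlation value normalized so that
    identical frames give 1.
    """
    a = frame_a - frame_a.mean()
    b = frame_b - frame_b.mean()
    # Circular cross-correlation: corr[d] = sum_r a[r] * b[r + d]
    corr = np.fft.ifft2(np.conj(np.fft.fft2(a)) * np.fft.fft2(b)).real
    corr /= np.sqrt((a ** 2).sum() * (b ** 2).sum())
    iy, ix = np.unravel_index(np.argmax(corr), corr.shape)
    # Map indices above half the frame size to negative shifts.
    h, w = corr.shape
    dy = iy - h if iy > h // 2 else iy
    dx = ix - w if ix > w // 2 else ix
    return (int(dy), int(dx)), float(corr[iy, ix])

# Synthetic check: a random pattern displaced by a known amount.
rng = np.random.default_rng(0)
speckle = rng.random((64, 64))
moved = np.roll(speckle, shift=(2, -3), axis=(0, 1))
(dy, dx), value = correlation_peak(speckle, moved)
```

Applying such a function to each pair of successive frames yields one peak position and one peak value per frame, which are the two time series from which the statistical parameters are derived.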

We initially defined a list of 27 parameters of interest, from which we extracted
a list of 20 useful correlation-related parameters that could also be easily
calculated numerically. The extracted parameters are listed in Table 1. The selection follows a
significance criterion on the values obtained for each parameter and the simplicity of their
numerical extraction.



Table 1. Set of the 20 Inspected Parameters and their Assigned Serial Number

Serial No.  Parameter      Serial No.  Parameter      Serial No.  Parameter               Serial No.  Parameter
1           amp x shift    8           std shift y    15          std speed x             22          mean correlation
2           amp y shift    9           amp x speed    16          std speed y             23          std corr
5           mean shift x   10          amp y speed    17          amp cumulative shift x  25          amp of corr speed
6           mean shift y   13          mean speed x   18          amp cumulative shift y  26          mean of corr speed
7           std shift x    14          mean speed y   21          amp correlation         27          std corr speed



Serial numbers are attached to the 20 parameters and a short description of the 
corresponding parameter is given to each number. The physical meaning of these parameters 
is as follows: 



• 1, 2: amplitude of the displacement of the correlation peak in x and y, respectively.

• 5, 6: mean of the parameters (1, 2), respectively.

• 7, 8: standard deviation (std) of the parameters (1, 2), respectively.

• 9, 10: amplitude of the displacement speed in x and y, respectively.

• 13, 14: mean of the parameters (9, 10), respectively.

• 15, 16: standard deviation of the parameters (9, 10), respectively.

• 17, 18: amplitude of the cumulative displacement speed in x and y, respectively.

• 21, 22, 23: amplitude, mean and standard deviation, respectively, of the normalized corr-value (the normalization is such that the maximal correlation value is one).

• 25, 26, 27: amplitude, mean and standard deviation, respectively, of the speed of the corr-value.
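
As an illustration of how parameters of this kind could be computed from a time series of correlation-peak shifts, consider the sketch below. Several points are assumptions rather than statements of the original processing: "amplitude" is taken to mean the peak-to-peak range (max minus min), the speed is taken as the frame-to-frame difference multiplied by the 2000 fps frame rate, and the cumulative displacement is approximated by a running sum of the frame-to-frame shifts. The function names are hypothetical.

```python
import numpy as np

FRAME_RATE = 2000.0  # frames per second, as in the recordings described above

def series_stats(values):
    """Amplitude (assumed here to mean max - min), mean and standard deviation."""
    values = np.asarray(values, dtype=float)
    return {"amp": float(values.max() - values.min()),
            "mean": float(values.mean()),
            "std": float(values.std())}

def shift_parameters(shift_x):
    """Statistics of a per-frame shift series and of its speed.

    shift_x: correlation-peak displacement between adjacent frames (pixels).
    """
    shift_x = np.asarray(shift_x, dtype=float)
    speed_x = np.diff(shift_x) * FRAME_RATE   # change in shift per second
    cumulative_x = np.cumsum(shift_x)         # running displacement (assumed definition)
    return {"shift": series_stats(shift_x),
            "speed": series_stats(speed_x),
            "cumulative_amp": series_stats(cumulative_x)["amp"]}

# Hypothetical four-frame shift series, for illustration only.
params = shift_parameters([0.0, 1.0, -1.0, 2.0])
```

Under these assumptions, the "shift" entries would correspond to parameters such as 1, 5 and 7, the "speed" entries to 9, 13 and 15, and the cumulative amplitude to 17, with the y direction handled identically.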

Some of the 20 parameters for one of the 12 measured hRBCs can be seen in Figs. 2(a)-2(d),
where we show the displacement of the correlation peak in the X-Y plane (upper plot) and
the histograms of the X and Y displacements, respectively (two lower plots). Fig. 2(a)
shows the relative displacement, i.e. the change in position between two adjacent
temporal frames. Fig. 2(b) repeats the same presentation for the speed of movement of
the correlation peak. Fig. 2(c) repeats the presentation of Fig. 2(a) but for the
cumulative displacement, i.e. relative to the first frame of the measurement. In Fig. 2(d),
the upper plot shows the correlation value versus the frame number and the lower plot
shows the histogram of the upper plot.





Fig. 2. Some examples of the 20 relevant parameters corresponding to an inspected healthy 
RBC. 

Note that a given parameter and its cumulative counterpart (e.g. the amplitude of the shift)
are computed differently, as follows:

$$ x_n = \mathrm{pos}\,\max\{ s_n \otimes s_{n-1} \}, \qquad x_n^{cum} = \sum_{k=2}^{n} \mathrm{pos}\,\max\{ s_k \otimes s_{k-1} \}, \tag{1} $$

where x_n is the position of the correlation peak when correlating (⊗) the region of interest in
frame n with the one in frame n-1, and x_n^cum is the cumulative shift, which is the position of the
peak when correlating frame n with frame 1. The velocity (regular or cumulative) is computed
by dividing the difference between consecutive shift values (regular or cumulative) by the elapsed time.
In Fig. 3(a) we present a microscope image of an iRBC obtained under white light 
illumination and in Fig. 3(b) we show the speckle pattern with the laser illumination. 




Fig. 3. (a). Microscope image of an infected RBC. (b). Speckle pattern 




Fig. 4. Some examples of the relevant 20 parameters for one of the infected RBC. 

Similarly to Fig. 2, Figs. 4(a)-4(d) present the same parameters shown in Fig. 2 for a 
hRBC, but now considering the iRBC shown in Fig. 3. 

Comparing the parameters shown in Figs. 2 and 4 for the hRBC and iRBC, respectively, one
can notice some differences: e.g., the amplitude of the cumulative XY displacement for the
infected RBC is larger than for the healthy RBC, and the range and the distribution of the
correlation values are also different. Nevertheless, those changes are small and not
consistent across all the measurements, and thus more sophisticated numerical analysis tools had
to be developed. In the next two sections we present two such numerical tools and
discuss their capability to discriminate hRBCs from iRBCs.

4. Malaria diagnostics via fuzzy logic

After proper inspection of the data collected at a defocusing distance of 2 µm
away from the RBC focal plane, we constructed the fuzzy logic algorithm for
processing the recordings. Straightforward comparison of data between infected and
healthy samples yields no conclusive decision criterion by which a set of values can be associated
with an infected or a healthy RBC, as discussed above for two samples and demonstrated for
all the samples when considering the 4 of the 20 parameters shown in Figs. 5 and 6.

Note that defocus affects the accuracy of measurement since it affects the size of the
generated speckle patterns. Moreover, the accuracy in estimating the Z (axial) movement
strongly depends on the defocusing distance, since if the distance is too large, Z movement will
not affect the distribution of the speckle patterns at all. In our case, we captured images at
several nearby distances, analyzed all of them, and eventually optimized our system
by choosing the distance providing the best detection performance.

Looking at the data, there is no clear Boolean rule to differentiate between healthy and
infected cells. Moreover, any rule based on a comparison between two particular samples
(one healthy and one infected) will not be valid for all other pairs of samples. This leads us to
seek a flexible inference engine based on a combination of samples, and so we turn to fuzzy
logic. Fuzzy logic, first introduced by Zadeh [35], addresses, among other things, problems
where Boolean logic fails to give a solution due to its coarseness and a more flexible logic is
required. In fuzzy logic inference we first have to establish a set of rules [36] based on the
data.

Fig. 5. An example of 4 parameters measured for the 12 hRBC (healthy cell samples).

Fig. 6. An example of 4 parameters measured for the 13 iRBC (infected cell samples).

For this reason we need to pick part of the data as the basis of our rules. We decided
to calculate the average values over the first 5 samples for each of the 20
parameters, separately for healthy and infected RBCs.

Five samples are taken because 5 is sufficiently large for the average
value to be meaningful but sufficiently small to leave enough data to be tested with
the fuzzy logic rules generated from these 5 samples. We then reorder the 20 parameters of
Table 1 according to the averaged parameter values, from the largest (left) to the smallest
(right), as shown in Table 2 (hRBC) and Table 3 (iRBC), respectively.

Table 2. Parameter ordering for hRBCs from the highest to the smallest

Table 3. Parameter ordering for iRBCs from the highest (left) to the smallest (right)

As can be seen, while several parameters remain in the same relative position for either
healthy or infected cells (the black serial numbers), others swap places, e.g., parameter 14 has
a larger value than parameter 26 for healthy RBCs but a smaller value for infected RBCs.

We have decided to focus on three pairs of parameters that seemed to have the greatest 
potential for differentiating between the healthy and infected cells: (18,17), (9,25) and (26,14), 
shown in boldface, in Table 4. 



Table 4. Unified parameters ordering for hRBC (first row) and iRBC (second row) and
selection of three pairs of parameters for processing (bold)

According to the ordering rule shown in Table 4, an hRBC should have larger values for
parameter 26 than for parameter 14, larger values for parameter 9 than for parameter 25 and
larger values for parameter 18 than for parameter 17. However, if we look at the partial
data depicted in Fig. 7 and Fig. 8, representing the three pairs of selected parameters for the
first 5 samples used to calculate the averaged values, we observe that the ordering rule does
not fit all the samples. For example, Sample2 from hRBC and Sample3 from iRBC have
two pairs that do not respect the ordering rules derived by averaging the values. Note that
except for Sample5 of the healthy cells, none of the samples individually obeys all the rules. This
is the advantage of fuzzy logic: the overall conclusion (as given later on) may be correct even if
some of the rules are not met.

Fig. 7. Three pairs of parameters for the healthy cell samples. The first 5 samples are used to
determine the rules. Notice that some cells do not fit the reordering rule derived by averaging.

Fig. 8. Three pairs of parameters for the infected cell samples. The first 5 samples are used to
determine the rules. Notice that some cells do not fit the reordering rule derived by averaging.
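
To make concrete why a strict Boolean test on the ordering rules is too rigid, the following sketch simply counts how many of the three ordering rules stated above for healthy cells a given sample satisfies. It is a simplified illustration and not the fuzzy inference used in this work; the function name and the sample values are hypothetical and do not come from the measured data.

```python
# Ordering rules as stated above for healthy cells: the first parameter of
# each pair is expected to be larger than the second.
HEALTHY_RULES = [(26, 14), (9, 25), (18, 17)]

def rules_satisfied(sample, rules=HEALTHY_RULES):
    """Count how many ordering rules a sample satisfies.

    sample: dict mapping parameter serial number -> measured value.
    """
    return sum(1 for larger, smaller in rules if sample[larger] > sample[smaller])

# Hypothetical values, for illustration only (not measured data).
example = {26: 0.8, 14: 0.5, 9: 1.2, 25: 1.5, 18: 3.0, 17: 2.0}
count = rules_satisfied(example)  # two of the three rules hold
```

A Boolean classifier requiring all three rules would reject this sample, whereas a graded decision can still weigh the two satisfied rules against the one that fails.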

Since Boolean rules are not sufficient to make the decision whether a cell is infected or 
not, according to the three parameter-pairs chosen, we establish fuzzy logic rules that combine 
the 3 pairs of columns in a 5 x 5 x 5 3D rule matrix, like the one given in Fig. 5 [37]. Note 
that the rule matrix contains fuzzy terms such as very small, medium etc. A typical fuzzy rule 
may be "if ratio of columns 26 and 14 is small and the ratio of columns 9 and 25 is very large 
and the ratio of columns 18 and 17 is very small then the outcome is 3 (corresponding to 
large)". This rule can be seen in Fig. 9, when addressing the second column and the fifth row 
of the first (leftmost) 2D matrix.

Fig. 9. Fuzzy logic ruling.

We assume that a 3-valued vector represents the input for each cell and generates a 3D
Gaussian that is shifted according to the vector's elements. A 2D example of such a shift and
how it is represented is given in Fig. 10.
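
One possible reading of this Gaussian weighting is sketched below: the three input values place a separable 3D Gaussian on the 5 x 5 x 5 rule grid, and the grade is the Gaussian-weighted average of the rule outcomes. This is an assumption-laden illustration, not the inference engine of this work: the width `sigma`, the mapping of parameter ratios onto the level axis, and the placeholder rule matrix are all hypothetical (the actual rules are those of Fig. 9).

```python
import numpy as np

LEVELS = 5  # very small, small, medium, large, very large

def gaussian_weights(center, sigma=0.75):
    """3D Gaussian over the 5 x 5 x 5 rule grid, centered at `center`.

    center: three values in [0, 4], one per parameter-pair ratio, already
    mapped onto the level axis (the mapping itself is not specified here).
    """
    axis = np.arange(LEVELS)
    g = [np.exp(-((axis - c) ** 2) / (2 * sigma ** 2)) for c in center]
    w = g[0][:, None, None] * g[1][None, :, None] * g[2][None, None, :]
    return w / w.sum()

def fuzzy_grade(center, rule_matrix, sigma=0.75):
    """Grade = Gaussian-weighted average of the rule outcomes."""
    return float((gaussian_weights(center, sigma) * rule_matrix).sum())

# Placeholder rule matrix (NOT the rules of Fig. 9): the outcome grows with
# the first index only, so the resulting grade is easy to check by symmetry.
rules = np.broadcast_to(np.arange(LEVELS, dtype=float)[:, None, None], (5, 5, 5))
grade = fuzzy_grade((2.0, 1.0, 3.0), rules)
```

Because neighboring rules contribute with smoothly decreasing weight, a sample that violates one rule can still receive a grade close to that of its group.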

Then, the 6 parameters (3 pairs) for each sample are fed into the fuzzy logic inference
engine. The result is a number indicating the grade given to the current sample. We then take
the average of the 3 parameter-pairs for the test group and obtain a reference grade. If the
grade of a specific sample is larger than the reference grade, we classify the sample as a
member of the group. This is done separately for the list of cells that are healthy and for those
that are infected. The outcome is given in Fig. 11, where cells correctly diagnosed as
healthy are drawn in green and cells correctly diagnosed as infected are drawn in red.

Fig. 10. An example of applied Gaussian weighting.

Fig. 11. Results for the fuzzy logic processing: (a) 8 of 12 samples are correctly diagnosed as
healthy (above at least one threshold line) and (b) 9 of 13 samples are correctly diagnosed
as infected (below at least one threshold line).

The result is that we correctly identify 67% of the healthy cells and 69% of the infected 
cells. Note that healthy Sample2 that was previously marked for not obeying two rules is 
indeed not identified as healthy, but infected Sample3 which was previously marked for not 
obeying two rules is identified as infected, as required. 

Note that if one wishes to use the fuzzy logic approach for detecting other types of cells, one
may have to change the rules, because other types of cells may be better characterized by
different physical parameters, and even if the same physical parameters are suitable, a
different range of values may better express their physical properties. In order to
properly adapt the rules to the type of problem being investigated, it is always recommended
(as we also did in this paper) to take a few measurements as references for defining the rules
and then, after properly extracting the rules, to apply the formulated mechanism to
the rest of the measurements in order to validate the approach for the new
types of cells.

5. Principal components analysis 

Principal component analysis (PCA) is a mathematical procedure that uses an orthogonal 
transformation to convert a set of observations of possibly correlated variables into a set of 
values of uncorrelated variables called principal components [38,39]. The number of principal 
components is less than or equal to the number of original variables. This transformation is 
defined in such a way that the first principal component has variance as high as possible (that 
is, accounts for as much of the variability in the data as possible), and each succeeding 
component in turn has the highest variance possible under the constraint that it will be 
orthogonal to (uncorrelated with) the preceding components. 

Principal components are guaranteed to be independent only if the data set is jointly 
normally distributed. PCA is sensitive to the relative scaling of the original variables. Often, 
its operation can be thought of as revealing the internal structure of the data in a way which 
best explains the variance in the data. If a multivariate data set is visualized as a set of 
coordinates in a high-dimensional data space (1 axis per variable), PCA can supply the user 
with a lower-dimensional picture, a "shadow" of this object when viewed from its (in some 
sense) most informative viewpoint. This is done by using only the first few principal 
components so that the dimensionality of the transformed data is reduced. 

5.1. First analysis

The first step was to look for different behaviors of the data at each of the three distances
at which speckle data were recorded, i.e., 2 µm, 20 µm and 50 µm. The data
were arranged in three graphs, where each of the 20 parameters (which measure different
quantities in different units) is represented by an index on the x-axis, and its measured
value is given on the y-axis. Red triangles represent the infected cells while the green
markers represent the healthy ones. For data taken at the 50 µm distance, many of the parameters behave
as seen in Fig. 12, where one sees that the values of the healthy cells are approximately normally
distributed, while those of the infected cells are more spread out.

Fig. 12. Values of parameter 1 of healthy and infected cells for data taken at a distance of 50 µm.

It seems that the values of many parameters have smaller variance for healthy cells than for infected ones.
Therefore, PCA can be a very good method for discriminating between the two
types of cells. The best discrimination was found for data collected at a
distance of 50 µm.

5.2. Data standardization 

The first step when applying PCA to the data is normalizing it. As observed in Figs.
13(a) and 13(b), not all variables have the same scale. PCA calculates the covariance of
every pair of variables, so applying it to non-normalized data gives an unwanted
advantage to variables with larger scales. The x-axis is the variable number (as previously
noted, we have 20) and the y-axis is the value it takes. One can easily see that the variable
labeled "15" has a much larger scale than the others, and that the other variables also differ in scale among
themselves.

Fig. 13. (a). Parameter 15 compared to others for distance of 50 µm. (b). All parameters and
values for distance of 50 µm.

Normalization converts the data to zero mean and a standard deviation of 1, i.e.,
to normalize the data one subtracts the mean (average) and then divides by the
standard deviation. This procedure has to be done for each column (every
variable). Let us denote by Z the new standardized matrix. The
normalized data are shown in Fig. 14.
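
The column-wise standardization described above can be sketched as follows. This is a minimal illustration: the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not state which estimator was used, and the example matrix is hypothetical.

```python
import numpy as np

def standardize(X):
    """Column-wise z-score: zero mean and unit standard deviation per variable.

    X: array of shape (n_samples, n_variables).
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)  # sample standard deviation (an assumption)
    return (X - mean) / std

# Two hypothetical variables on very different scales.
X = np.array([[1.0, 1000.0], [2.0, 3000.0], [3.0, 2000.0]])
Z = standardize(X)
```

After this step both columns have comparable spread, so neither dominates the covariance computation. A column with zero variance would cause a division by zero and would need to be removed beforehand.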

5.3. Eigenvalues and eigenvectors

The covariance matrix C has eigenvalues and eigenvectors, which one can calculate. Let D be 
the diagonal matrix of the eigenvalues of C, and let V be the matrix whose columns are the 
corresponding eigenvectors. To understand the meaning of these eigenvalues, it is essential 
to rearrange them in decreasing order and to reorder the columns of V accordingly. 

Each eigenvalue now corresponds to one dimension. The first (and now the largest) 
eigenvalue corresponds to the first principal component, the dimension that explains more of 
the variance than any other. The second one corresponds to the second principal component, 
and so on. 
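
The reordering step can be sketched as follows. This assumes the eigenvalues and eigenvectors have already been obtained from some numerical routine; the helper name `sort_eigenpairs` and the 2 x 2 toy matrix are illustrative and not taken from the experiments.

```python
def sort_eigenpairs(eigenvalues, eigenvectors):
    """Reorder eigenvalues decreasingly and permute the eigenvector columns to match.

    `eigenvectors` is a list of rows; column k is the eigenvector of eigenvalues[k].
    """
    order = sorted(range(len(eigenvalues)), key=lambda k: eigenvalues[k], reverse=True)
    sorted_values = [eigenvalues[k] for k in order]
    sorted_vectors = [[row[k] for k in order] for row in eigenvectors]
    return sorted_values, sorted_vectors

# Toy example: C = [[2, 1], [1, 2]] has eigenvalues 1 and 3 with eigenvectors
# (1, -1)/sqrt(2) and (1, 1)/sqrt(2), supplied here in increasing order.
r = 2 ** -0.5
values, vectors = sort_eigenpairs([1.0, 3.0], [[r, r], [-r, r]])
```

Sorting an index list rather than the eigenvalues themselves keeps each eigenvector column paired with its own eigenvalue.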
Fig. 14. Normalized data. 



To explain enough of the variance, one needs to calculate the cumulative sum of the 
eigenvalues. If D is the matrix above (with decreasing values), then for every i, 

$$\frac{\sum_{j=1}^{i} D_{jj}}{\operatorname{tr}(D)} \qquad (2)$$

calculates the cumulative sum, normalized by the total. For instance, in the 50 µm distance 
results, the first 15 eigenvalues of C are {9.40, 2.64, 2.43, 1.32, 1.12, 0.94, 0.75, 0.60, 
0.38, 0.11, 0.09, 0.06, 0.05, 0.02, 0.01}. It is easily seen that the first 5 values account 
for most of the total. Indeed, their normalized cumulative sum is 0.85. Explaining 85% of the 
variance seems sufficient, so the next step is to build a new matrix from the first 5 
eigenvectors, matching the first 5 eigenvalues: M = [v_1, v_2, ..., v_5]. 
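
The quoted figure of 0.85 can be checked with a short calculation. The sketch below uses the 15 eigenvalues listed in the text and assumes that the trace of D equals 20, i.e., the number of standardized variables; this assumption is consistent with the listed values, which already sum to 19.92.

```python
from itertools import accumulate

# First 15 eigenvalues quoted in the text for the 50 µm data set.
eigenvalues = [9.40, 2.64, 2.43, 1.32, 1.12, 0.94, 0.75, 0.60,
               0.38, 0.11, 0.09, 0.06, 0.05, 0.02, 0.01]

# Assumption: tr(D) = 20, one unit of variance per standardized variable.
TRACE = 20.0

# cumulative[i - 1] is the fraction of the variance explained by the first i components.
cumulative = [partial / TRACE for partial in accumulate(eigenvalues)]

print(round(cumulative[4], 2))  # fraction explained by the first 5 components
```

The first five eigenvalues sum to 16.91, which gives 16.91 / 20 ≈ 0.85, in agreement with the value stated above.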

5.4. Forming the new data 

Given the matrix M above, all that is needed is to multiply the original normalized data 
matrix Z by M from the right: S = Z × M. The next step is calculating the length of each 
row vector of S (which is a point in the new 5-dimensional space). The formula 

$$\rho_i = \sqrt{\sum_{k=1}^{5} S_{ik}^{2}} \qquad (3)$$

gives the length of the i-th vector in the new space. The last step is finding the radius that 
separates the two populations. Plotting the length of the vectors versus their index yields the 
graph of Fig. 15. In this figure the horizontal axis is the patient index (12 healthy and 12 
infected) and the vertical axis is the computed ρ_i. 
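
The projection and length computation can be sketched as below, assuming that the length is the ordinary Euclidean norm of each row of S. The function names and the toy matrices are hypothetical and serve only to show the mechanics, with 2 retained components instead of 5.

```python
import math

def project(Z, M):
    """Return S = Z x M for Z (n x p) and M (p x k), both given as lists of rows."""
    return [
        [sum(z * m for z, m in zip(row, column)) for column in zip(*M)]
        for row in Z
    ]

def vector_lengths(S):
    """Euclidean length of each row of S (assumed meaning of the vector length)."""
    return [math.sqrt(sum(s * s for s in row)) for row in S]

# Toy values (hypothetical): 2 samples, 3 standardized variables, 2 retained eigenvectors.
Z = [[3.0, 4.0, 1.0], [0.0, 0.0, 2.0]]
M = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
S = project(Z, M)          # [[3.0, 4.0], [0.0, 0.0]]
rho = vector_lengths(S)    # [5.0, 0.0]
```

Each sample is thereby reduced to a single number, which is what allows the separation to be drawn as a radius.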




Fig. 15. Separation between infected and healthy cells. The figure presents the plot of the 
length of the vectors versus the index. 
The green points represent healthy cells, and the red points represent infected ones. Thus, 
from this result one may see that there is a 100% probability that the cell is infected when 
ρ_i > 3, and an 83% probability (10 out of 12 patients) that the cell is healthy if ρ_i < 3. 
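
The radius-based decision can be written as a one-line rule. This is only a sketch: the function name `classify` and the sample lengths are hypothetical and are not the measured values behind Fig. 15, and a hard threshold ignores the fact, noted above, that a length below 3 indicates a healthy cell with only 83% probability.

```python
def classify(lengths, radius=3.0):
    """Label a sample 'infected' when its vector length exceeds `radius`, else 'healthy'."""
    return ["infected" if length > radius else "healthy" for length in lengths]

# Hypothetical vector lengths chosen to fall on both sides of the radius.
labels = classify([1.2, 2.9, 3.4, 6.0])
```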

Note that there is essentially no relation between the fuzzy logic and the PCA techniques 
presented in this paper (fuzzy logic is not a prerequisite for applying PCA), and they are 
more or less orthogonal in their decision making process. However, this is exactly why we 
used both. We think that, because the two approaches are very different, combining them 
may result in better overall detection performance. Although in general the fuzzy logic did 
not perform as well as PCA, it did well on different types of cells, and thus a smart 
combination of both techniques in the overall decision making process may produce a more 
noise-immune algorithm capable of detecting infected cells at earlier stages, with a higher 
detection probability and a lower false alarm probability. 

Also note that the biomechanical properties of iRBCs are severely altered as the disease 
progresses (from the ring, to the trophozoite, and to the schizont stage), and thus the 
flickering of iRBCs changes during this alteration. For example, the flickering of iRBCs at 
the early (ring) stage is similar to that of hRBCs, and the flickering of iRBCs at the very 
late (schizont) stage is very weak. In this study we tried to inspect RBCs at different stages 
of the disease, and indeed we also saw preliminary and encouraging results indicating that 
with the proposed approach we will be able not only to detect whether a cell is infected but 
also to determine the stage of the disease. However, those results were less controlled (since 
we ourselves were not 100% sure about the exact stage of the disease), and thus we decided 
not to use them in the preliminary proof of concept presented in this paper. Our current task 
is without a doubt to extend the proposed principle and algorithms to detecting the exact 
stage of the disease, and this will be the aim of our next paper. 

6. Conclusions 

In this paper we presented a new optical technique based upon processing of speckle 
statistics in order to detect malaria. The processing was based upon construction of fuzzy 
logic based ruling and principal component analysis applied to correlation related 
manipulation of the time varying speckle patterns. Preliminary experimental results showed 
a high capability of detecting infected cells (100% probability using PCA). Combining the 
fuzzy logic and the PCA can also provide good results in the detection of healthy cells (the 
two techniques produce errors for different samples). 